We have previously observed that all known potent broadly neutralizing antibodies (bnAbs)
against HIV-1 are highly divergent from their putative germline predecessors, in contrast to
bnAbs against viruses causing acute infections such as henipaviruses and SARS CoV, which
are much less divergent from their germline counterparts. Consequently, we have hypothesized
that germline antibodies may not bind to the HIV-1 envelope glycoprotein (Env)
because they are so different from the highly somatically mutated HIV-1-specific
bnAbs. We have further hypothesized that the immunogenicity of highly conserved epitopes
on the HIV-1 envelope glycoproteins (Envs) may be reduced or eliminated by their
very weak or absent interactions with germline antibodies, so that immune responses leading
to the elicitation of bnAbs may not be initiated and/or sustained. Even if such responses are
initiated, the maturation pathways are so extraordinarily complex that prolonged periods
of time may be required for elicitation of bnAbs with defined unique sequences. We provided
the initial evidence supporting this antibody germline/maturation hypothesis, which
prompted a number of studies to design vaccine immunogens that could bind putative
germline predecessors of known bnAbs and to explore complex B cell lineages. However,
guiding the immune system through the exceptionally complex antibody maturation pathways
to elicit known bnAbs remains a major challenge. Here, we discuss studies exploring
the antibody germline/maturation hypothesis as related to elicitation of bnAbs against HIV-1
and present our recent data demonstrating the existence of germline-like precursors of
VRC01 antibodies in a human cord blood IgM library.
INTRODUCTION

Elicitation of broadly neutralizing antibodies (bnAbs) targeting 
the HIV-1 envelope glycoproteins (Envs), the key to an effective 
HIV-1 vaccine, remains elusive. Previous studies have demonstrated
several properties of the HIV-1 Envs that could limit their
ability to elicit bnAbs. These include protection of the conserved 
structures by variable loops (1-3), remarkable genetic diversity 
(4), a glycan shield (5), steric occlusion (6), and conformational 
masking (7). Until 6 years ago, only a handful of bnAbs, including 
b12, 2G12, 2F5, and 4E10, were known. Although the structural
and functional studies of those bnAbs revealed some important 
neutralization epitopes (8), such bnAbs have not been successfully 
elicited by any vaccination approach. 

In 2007, we first noted that in HIV-1 specific bnAbs the num- 
ber of amino acid mutations from their closest corresponding 
germline sequences was significantly higher than that of bnAbs 
against the SARS CoV coronavirus, and Hendra and Nipah viruses, 
which cause self-limiting acute infections (9). Using a large non-immune
IgM library, we identified several HIV-1 Env-specific
antibodies and found that they had fewer somatic mutations than
the HIV-1 bnAbs, as well as limited neutralizing activity (9). These
findings indicated that elicitation of HIV-1 bnAbs would require
far more extensive maturation processes than those needed to generate
the bnAbs against the SARS CoV and henipaviruses. We therefore
suggested that the difficulty of eliciting these bnAbs may be
due, at least in part, to the complex and prolonged maturation
pathways required for the development of bnAbs against HIV-1,
which can take a long time (10). We thus speculated that this
may represent another significant challenge in the development of
effective HIV-1 vaccines.

We quantified the number of mutations in human monoclonal
antibodies (mAbs) that we selected from phage libraries generated
from an HIV-1-infected patient with a known time of infection
(10). We calculated the number of amino acid mutations per heavy
chain V gene and defined it as antibody somatic mutational diversification
(ASMD). We compared the extent and dynamics of the
ASMD between HIV-1-specific mAbs and a panel of SARS CoV-specific
mAbs. Our experiments based on the ASMD predicted that
elicitation of HIV-1-specific bnAbs would take at least 3 years. An
illustrative mathematical model using the ASMD rate based on
an exponential time-dependent function suggested that a much
longer time would be needed for the required maturation, unless
somatic diversification had already been initiated from an intermediate
antibody. Thus, all these initial studies corroborated our
hypothesis that the infrequent occurrence or absence of bnAbs
in HIV-1-infected patients could be due, at least in part, to the
lack or limited availability of B cell receptors that rapidly mature
into bnAbs. Therefore, we suggested that appropriate immunization
protocols of long duration need to be developed using the
knowledge gained from the exploration of antibody maturation
pathways in humans (10).
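
To make the ASMD definition concrete, the following minimal sketch counts amino acid differences between a heavy chain V region and its germline counterpart. It assumes the two sequences have already been aligned to the same length; the function name `asmd` and the toy sequences are hypothetical and are not data from the cited study (10).

```python
def asmd(germline_v: str, antibody_v: str) -> int:
    """Count amino acid differences between two pre-aligned V-gene sequences.

    Assumption: both strings are already aligned and have equal length.
    """
    if len(germline_v) != len(antibody_v):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for g, a in zip(germline_v, antibody_v) if g != a)


# Toy, made-up sequences used only to exercise the function.
toy_germline = "ACDEFGHIKL"
toy_antibody = "ACDQFGHLKL"
toy_asmd = asmd(toy_germline, toy_antibody)
```

A real analysis would first align each antibody V region to its closest germline gene; the count above is only meaningful once that alignment exists.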

From the striking observation that all known potent HIV-1
bnAbs are highly divergent from their putative germline predecessors,
in contrast to bnAbs against henipaviruses and SARS CoV,
we hypothesized that, since the germline antibodies
are so different from the highly somatically mutated
HIV-1 bnAbs, they may not bind to the Env. This led us to
the hypothesis that the immunogenicity of the highly conserved
epitopes on the HIV-1 native envelope glycoproteins (Envs) is
reduced or eliminated by their very weak or absent interactions
with germline antibodies, which could not initiate and/or sustain
immune responses leading to elicitation of bnAbs. Even if
immune responses are initiated and sustained, the maturation
pathways are so complex that help and long times may be needed
for their elicitation. To test our antibody germline/maturation
hypothesis, we designed germline-like antibodies corresponding
to the known bnAbs b12, 2G12, 2F5, X5, m44, and m46 (the
latter three antibodies were discovered in our laboratory and possess
HIV-1 cross-reactivity with moderate neutralizing activities)
and evaluated them for binding to Envs (11). We found that
while germline-like X5, m44, and m46 bound to Envs with relatively
high affinity, the germline-like precursors of b12, 2G12,
and 2F5 failed to bind Envs in an ELISA although their
corresponding mature bnAbs bound strongly. These results provided
initial evidence that the Env structures containing conserved
epitopes might not initiate humoral responses due to limited
or absent binding to the germline precursors of bnAbs. These
germline precursors may also be of limited availability, as recently
reported (12).

Following that initial study, we expanded our investigation
to different variants of two antibodies (b12 and
X5), including their closest germline counterparts and several
germline-like intermediates (13). The experiments showed that
b12 intermediate antibodies neutralized only some HIV-1 isolates
with relatively weak potency. In contrast, intermediates of
X5 neutralized a subset of the tested HIV-1 isolates with efficiencies
comparable to those of the matured X5. These results helped
explain the relatively high immunogenicity of the coreceptor binding
site on gp120 and the abundance of CD4-induced (CD4i)
antibodies in HIV-1-infected patients (X5 is a CD4i antibody) as
well as the maturation pathway of X5. In the case of b12, germline-like
intermediates along the maturation pathway were shown to
bind not only some Envs but also human self antigens, suggesting
that antigens other than the Envs could help guide the immune
system through the b12 maturation pathway.

Therefore, we proposed a conceptually new vaccination
approach, in which it is critical to identify primary immunogens
that bind to the germline antibodies that are predecessors
of bnAbs. If needed, these immunogens should be combined
with secondary immunogens that are recognized by intermediate and/or
matured antibodies to guide the immune system through the prolonged,
complex maturation pathways (14). In this respect, we
envisioned that the knowledge of human antibodyomes would
become indispensable to elucidate the origin, diversity, and maturation
pathways of bnAbs and discover germline-like intermediates
of bnAbs that could provide a basis for the design of novel
HIV-1 vaccine immunogens (14, 15).

In recent years, several groups have reported a number of new
bnAbs that were identified from multiple HIV-1-infected individuals
using novel designed antigen baits and advanced technologies
for isolating human mAbs and high-throughput
sequencing (16). In particular, Haynes, Kwong, Stamatatos, Schief
et al. have analyzed a large amount of data delineating the structural
and genetic determinants and maturation pathways of different
bnAbs. These studies not only confirmed our previous findings
that the Envs fail to engage germline versions of bnAbs but also
suggested possible holes in B cell repertoires and demonstrated
the implications of our antibody germline/maturation hypothesis
for finding germline-like precursors and intermediates as well as
for designing immunogens that could potentially bind to such
bnAb intermediates. In this report, we discuss the recent advancements
in HIV-1 vaccine research in the context of the antibody
germline/maturation hypothesis, and highlight critical factors to
be considered when exploring germline-like precursors and intermediates
of bnAbs. We also report for the first time the use of 454
sequencing data analysis of a human cord blood IgM library
to identify putative germline precursors of the heavy and light
chains of VRC01-like antibodies. These naturally occurring cord
blood-derived VRC01-like heavy and light chains may be useful
as putative templates for designing novel vaccine immunogens
that can lead to the elicitation of VRC01-like antibodies and for
understanding the maturation pathways of this bnAb. Still, there
are major challenges to be overcome. New empirical and semi-empirical
approaches could be successful; recently, new paradigms
were discussed that could better fit our increased knowledge of
HIV immunopathology and that could possibly be more helpful
in guiding future vaccine research than past unsuccessful
approaches (17).

MATERIALS AND METHODS 
ANTIBODYOME DATABASE AND TOOLS

DNA isolation, amplification, and 454 sequencing of the human 
cord blood IgM library were previously described in detail (18, 
19). For quality control, we trimmed the 454 sequence reads and 
retained only sequences with lengths of more than 300 nucleotides, 
covering the entire antibody variable domains consisting of all 
three complementarity determining regions (CDRs) along with 
framework regions (FRs). We used IMGT/HighV-QUEST for 
immunogenetic and statistical analyses (20). The output results 
from the IMGT/HighV-QUEST analysis were stored in a local 
PostgreSQL database, and structured query language (SQL) was 
used to retrieve the data for further analysis. Statistical calculations 
were carried out using JMP 10® statistical software (SAS Institute,
Cary, NC, USA).
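
The read-length filter and database retrieval described above can be sketched as follows. This is only an illustration under stated assumptions: Python's built-in `sqlite3` is used as a stand-in for the PostgreSQL database, and the table name `reads`, its columns, and the toy reads are hypothetical rather than the schema actually used.

```python
import sqlite3

MIN_LENGTH = 300  # reads must be longer than 300 nucleotides


def passes_length_filter(read: str, min_length: int = MIN_LENGTH) -> bool:
    """Keep only reads strictly longer than min_length nucleotides."""
    return len(read) > min_length


def load_reads(conn, reads):
    """Store (read_id, v_gene, sequence) rows that pass the length filter."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reads (read_id TEXT, v_gene TEXT, seq TEXT)"
    )
    kept = [(rid, v, s) for rid, v, s in reads if passes_length_filter(s)]
    conn.executemany("INSERT INTO reads VALUES (?, ?, ?)", kept)
    return len(kept)


def reads_for_gene(conn, v_gene):
    """Retrieve the IDs of reads assigned to one V gene (hypothetical schema)."""
    rows = conn.execute(
        "SELECT read_id FROM reads WHERE v_gene = ? ORDER BY read_id", (v_gene,)
    )
    return [row[0] for row in rows]


# Toy reads: homopolymers used only to exercise the length filter.
conn = sqlite3.connect(":memory:")
toy_reads = [
    ("r1", "IGHV1-2", "A" * 350),
    ("r2", "IGHV1-2", "A" * 120),
    ("r3", "IGKV3-11", "C" * 301),
]
n_kept = load_reads(conn, toy_reads)
ighv1_2_ids = reads_for_gene(conn, "IGHV1-2")
```

In practice the V gene assignment would come from the IMGT/HighV-QUEST output rather than being supplied by hand as it is here.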
COMPUTATIONAL ANALYSIS OF ANTIBODY SEQUENCES

Antibody sequences from the IGHV1-2 and IGKV3-11 lineages were
retrieved using SQL statements from our local antibodyome database,
which consists of immunogenetic data derived from 454 sequencing
of the human cord blood IgM library. Amino acid
sequence identities between each of the selected lineage sequences 
from the 454 sequence data and pertinent germline sequences were 
calculated based on the pairwise alignment using local BLAST 
as implemented in BioEdit v7.0.9 (21). Phylogenetic analysis was 
carried out using the Archaeopteryx software (22). 
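
As an illustration of the identity calculation, the sketch below computes percent identity from a pairwise alignment that has already been produced. It is not the BLAST implementation in BioEdit: the choice to score only gap-free columns is an assumption, and BLAST's reported identity may be defined differently. The function name and toy alignment are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if gap not in (a, b)]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Toy alignment: five gap-free columns, four of which match.
toy_identity = percent_identity("ACDE-F", "ACDQGF")
```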

RESULTS AND DISCUSSION 

EXPLORING THE ANTIBODY GERMLINE/MATURATION HYPOTHESIS 

Our earlier observation of the extensive maturation of HIV-1
bnAbs in contrast to those against some viruses causing acute
infections led to the antibody germline/maturation hypothesis (9-11,
13, 14). According to this hypothesis, it is critical to identify
immunogens that would bind to germline and/or intermediate
antibodies of bnAbs, and the exploration of antibodyomes
could be useful for identifying such immunogens (14). Figure 1
describes the timeline of some of the key developments in
current HIV-1 vaccine research focused on antibody germline-like
intermediates and maturation pathways of bnAbs. Major research
efforts in this direction were spearheaded by deep sequencing and
structural biology studies of VRC01-like and other CD4-binding
site (CD4bs) antibodies from HIV-1-infected individuals. These
studies delineated possible maturation pathways of such antibodies
with high levels of somatic mutations and convergence in antibody
recognition (23, 24). Both studies revealed that the putative
germline precursors of these antibodies had weak or no apparent
affinity for Env, and that acquisition of a large number of somatic
mutations was needed for the breadth and potency of these antibodies.
These studies also explored antibody diversity and found
many intermediates of similar lineages of the heavy chain genes
from the two IGHV families VH1-2 and VH1-46 that paired with
different light chain genes. Thus, analysis of the VRC01-related
antibodyome from HIV-1-infected patients revealed B cell maturation
pathways that may help guide the vaccine-induced elicitation
of such antibodies. However, if we could find germline-like intermediates
of such bnAbs from a naive antibody repertoire, then
potential vaccine immunogens developed based on those templates
would stimulate an adequate B cell immune response in
healthy humans. To this end, we identified VRC01-like intermediate
antibodies from a naive antibody library of human cord blood,
which is presented later in the text.

FIGURE 1 | Timeline of the antibody germline/maturation hypothesis and experiments.

2007-2010: Initial observations and evidence, germline/maturation hypothesis for bnAbs.
BnAbs are highly divergent from their germlines and their elicitation may require an
extended period of time (Chen et al. AIDS Vaccine 2007; Chen et al. AIDS Res. Hum.
Retroviruses 2008). Germline BCRs do not bind or only weakly bind to Env, with implications
for immune evasion and HIV-1 vaccine (Xiao et al. Biochem Biophys Res Commun. 2009).
Maturation pathways of bnAbs and their intermediates (Xiao et al. Viruses 2009).
Antibodyome and hypothesis of elicitation of bnAbs by primary and secondary immunogens
(Dimitrov DS. mAbs 2010).

2011-2012: Maturation pathways of VRC01-like and other CD4bs antibodies.
Deep sequencing and structural biology of VRC01-like antibodies (Wu et al. Science 2011).
Sequence convergence among highly active agonistic CD4bs antibodies (Scheid et al.
Science 2011). Deep sequencing of human naive antibody libraries revealed germline
intermediates of bnAbs (Prabakaran et al. Front Microbiol. 2012; Larimore et al.
J Immunol. 2012).

2013-2014: Envs to activate germline BCRs and role of somatic mutations.
Co-evolution of CH103 antibody and founder virus; germline and intermediates of CH103
bind to the Env (Liao et al. Nature 2013). Mining the donors' antibodyomes for HIV-1
bnAbs, VRC01-class and 10E8 antibodies (Zhou et al. Immunity 2013; Zhu et al. Proc Natl
Acad Sci U S A. 2013). Lack of B-cell activation by the germline BCRs and engineering
of Env to activate germline BCRs (Hoot et al. PLoS Pathog. 2013; McGuire et al. J Virol.
2014 and J Exp Med. 2013). Immunogen design to target VRC01-class germline precursors
(Jardine et al. Science 2013). Role of somatic mutations/insertion in the framework
regions of bnAbs (Klein et al. Cell 2013). Maturation and intermediates of bnAbs from
the PGT121 family (Sok et al. PLoS Pathog. 2013). Developmental pathway for
V1V2-directed CAP256-VRC26 antibodies (Doria-Rose et al. Nature 2014).

Challenges in HIV-1 vaccine development.
Identification of germline intermediates with a minimal number of mutations. Discovering
appropriate Env immunogens that could bind to those intermediates and activate BCRs.
Deeper understanding of B cell biology.

We previously analyzed the IgM repertoires of healthy individuals
and identified several intermediates of b12 from the VH1-3
gene family (15). Sequence analysis of 28,925 unique sequences
from the IgM repertoires revealed a CDRH3 with a length (20
amino acids) and sequence (50% similarity) similar to those of the b12
CDRH3, but the V gene associated with that CDRH3 was found to
be HV4-b (15). This finding indicates that long CDRH3s may not
be a limiting factor for the development of bnAbs (25), although
the importance of long CDRH3 motifs with certain amino acid
preferences and/or associations with particular heavy or light chain
families favoring polyreactivity should not be discounted.

Stamatatos and coworkers screened a large panel of recombinant
Envs for binding to the germline
predecessors of b12, NIH45-46, and 3BNC60 to test how Env
immunogens interact with the predicted germline versions of
known bnAbs (26). They found that the mature bnAbs reacted
with diverse Envs but the corresponding germline antibodies did
not. They examined in detail the germline b12 and its chimeric
forms, either the germline heavy chain paired with the mature
light chain or vice versa, to test whether they could interact
with any of the recombinant Envs derived from clade A, B, and C
viruses. Among all the recombinant Envs tested, at least one Env
(QH0692) was found to bind a b12 chimera with a mature heavy
chain. However, this chimera failed to mediate calcium mobilization,
indicating no BCR activation via BCR-antigen engagement.
In other studies, they found that the elimination of certain conserved
glycosylation sites on Envs led to the binding of germline
versions of VRC01 and NIH45-46 and BCR activation (27) but
that the modified Envs did not interact with PG9 and 447-52D
germlines (28).

Haynes and coworkers have succeeded in finding Envs capable
of engaging the germline versions of a CD4bs bnAb, CH103,
while studying the co-evolution of the antibody and the founder
virus in an HIV-1-infected patient (29). They found that CH103 is less mutated
than most other CD4bs bnAbs and, importantly, that the unmutated
common ancestor of the CH103 lineage avidly bound the
transmitted/founder HIV-1 Envs. This finding suggests that early
founder Envs could bind optimally to the germline and intermediate
versions of CH103 and are therefore promising vaccine
immunogens, representing an important step forward in HIV-1
vaccine development.

Similarly, the maturation pathway of the potent V1V2-directed
HIV-neutralizing antibody, CAP256-VRC26, has been described, 
in which a germline-like intermediate with a 35-amino acid residue 
long CDRH3 was shown to bind and neutralize the superinfecting 
virus weakly, but did not bind or neutralize heterologous viruses 
(30). These results suggest that the CAP256-VRC26 lineage could 
be initiated by using a rare superinfecting-virus-like V1V2 Env. 

In another successful effort in identifying an Env that could
engage the germline versions of bnAbs, Schief and coworkers
devised a computation-guided approach combined with in vitro
screening to engineer a gp120 outer domain. The designed protein
not only bound to multiple VRC01-class bnAbs and their germline
precursors but also activated B cells expressing diverse intermediates
of the bnAbs (31). Therefore, priming with the protein and
subsequent boosting with more native immunogens could help
induce early somatic mutations and the ultimate elicitation of
VRC01-class bnAbs.

Interestingly, Nussenzweig and coworkers' study showed that 
somatic mutations of the FRs and insertions of some bnAbs are 
required for their broad and potent HIV-1 neutralizing activ- 
ity (32). Based on structural information, they made different 
germline versions of VRC01, NIH45-46, 12A21, and 3BNC117, 
and found that mutations in FRs were also essential for binding, 
breadth, and potency of most bnAbs. This suggested that certain 
framework mutations could be critical and should be preserved for 
designing the intermediates of such bnAbs. Several other studies
mining the HIV-1-infected donors' antibodyomes (33-35) revealed
putative intermediates of bnAbs. Many of them with lower levels
of somatic hypermutation could bind to selected Envs; for example,
intermediates of PGT121-134 were able to preferentially bind
native Envs relative to monomeric gp120 (36). We also identified
2F5-like antibodies (m66 and m66.6) with far fewer mutations
than 2F5 and suggested their use as a model system for elicitation
of such antibodies (37, 38).

All these newly discovered bnAbs raise hopes for effective
HIV-1 vaccine development as they reveal characteristic features
of bnAbs that could help us understand the immunological basis
critical for their production and also serve as templates for rational
vaccine design. Therefore, the focus has shifted dramatically
to exploring and overcoming the immunological hurdles associated
with the elicitation of bnAbs, namely, the extensive somatic
mutations of bnAbs. Major challenges remain in the identification of
intermediates with a minimal number of mutations and of appropriate
Env immunogens that would bind such intermediates and
activate BCRs, which can lead to the maturation of the intermediate
antibodies to bnAbs. Recently, new paradigms were discussed that
better fit our increased knowledge of HIV immunopathology and that
may be more helpful in guiding future vaccine research than
past unsuccessful approaches (17).

IDENTIFICATION OF PUTATIVE GERMLINE-LIKE INTERMEDIATES IN THE 
MATURATION PATHWAYS OF VRC01 

We previously characterized the human cord blood cell-derived
IgM antibodies using 454 sequencing to study gene diversity and
somatic mutations (19). Naive germline antibody repertoires, particularly
from babies, may be uniquely useful for understanding the
B cell maturation pathways, as babies can also mount an immune
response against HIV-1, as recently found (39). Our earlier gene
usage analysis of the cord blood IgM repertoire showed biased
IGHV gene usage (19) similar to that of adult IgM repertoires (40).
However, we noted that the IGHV1-2 gene usage was significantly
higher in the cord blood IgM repertoire, i.e., an overall
contribution of 20% as compared to 8% in adult IgM repertoires.
This suggested that the cord blood IgM repertoire may be advantageous
for the exploration of the IGHV1-2*02 lineages when
studying germline precursors and intermediates of the VRC01 heavy
chain. A total of 5,624 heavy chain and 1,096 light chain sequences
of the IGHV1-2 and IGKV3-11 lineages, respectively, were used to
select the top 10 sequences as closest intermediates for VRC01 in
each of the heavy and light chain categories by local BLAST searching.
We performed phylogenetic analysis of the selected sequences
to identify genetic relationships among VRC01-like intermediates
of heavy (Figure 2A) and light (Figure 2B) chains. We found two
antibody heavy chains, HWAV6 and JHEDT, which were
100% identical to the IGHV1-2*02 germline sequence. Remarkably,
their CDRH3 sequences had the same length (14 amino
acids) as that of the VRC01 heavy chain. For these 10 heavy chain
sequences, the CDRH3 lengths ranged from 8 to 16 amino acids
with sequence variations at the junctions. One of the germline
sequences, JHEDT, had a point mutation, Cys100Tyr (Kabat
numbering), in CDRH3 that exactly mimicked the residue Tyr100
of CDRH3 in VRC01. The residue Tyr100 in CDRH3 of VRC01
is most likely contributed by the IGHD3-16*02 germline with a
point mutation Cys100Tyr. The other heavy chain sequence, I76AT,
which was the closest to the VRC01 heavy chain, also had the same
Cys100Tyr mutation. One of the other germline sequences,
HWAV6, had Trp100B (Kabat numbering) in CDRH3 that exactly
mimicked the residue Trp100B of CDRH3 in VRC01. Intriguingly,
the Trp100B residue is a junctional amino acid of the CDRH3 in
germline HWAV6, and it exactly replicates the Trp100B junctional
residue of CDRH3 in VRC01. This suggests a possible maturation
mechanism involved in the VRC01-like intermediates where junctional
amino acids could determine the maturation pathway far
preceding the somatic hypermutation required for affinity maturation
(41). Most of the closest IGHV genes, 8 out of the 10 shown
in Figure 2A, have at least one mutation in the V region, and
two sequences, G2W0T and GD60C, each have two mutations in
CDRH1. The pre-existing amino acid mutations found in
the V region and CDRH3 sequence information may inform the
design of heavy chain germline-like precursors and intermediates,
and help naturally reconstruct the B cell clonal lineages in the
maturation pathways of VRC01.

FIGURE 2 | Phylogram and sequence alignment of selected (A) heavy
and (B) light chain sequences found in human cord blood IgM libraries
of IGHV1-2 and IGKV3-11 lineages, respectively. Phylogenetic trees
constructed using the amino acid sequences of the top 10 putative germline-like
precursors found in the cord blood library along with the mature heavy and
light chains of VRC01 are shown. Multiple sequence alignments of heavy
and light chain sequences along with their corresponding germline
sequences, V1-2*02 and KV3-11*01, made using BLOSUM62 scoring
matrices, are shown with conserved substitutions (blue) and non-conserved
substitutions (white).
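
The selection of the closest sequences described above can be sketched as a ranking step followed by a CDR3 length check. This is a simplified stand-in for the local BLAST search: it assumes an identity score to the reference is already available for each sequence, and the `Candidate` type, its field names, and the toy records are hypothetical.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    read_id: str
    identity: float  # percent identity to the reference V region
    cdr3: str        # CDR3 amino acid sequence


def top_candidates(candidates, n=10):
    """Return the n candidates with the highest identity (ties by read_id)."""
    return sorted(candidates, key=lambda c: (-c.identity, c.read_id))[:n]


def with_cdr3_length(candidates, length):
    """Keep candidates whose CDR3 has exactly the requested length."""
    return [c for c in candidates if len(c.cdr3) == length]


# Toy records; the CDR3 strings are placeholders, not real sequences.
toy = [
    Candidate("s1", 100.0, "A" * 14),
    Candidate("s2", 98.0, "A" * 8),
    Candidate("s3", 100.0, "A" * 16),
    Candidate("s4", 95.0, "A" * 14),
]
best = top_candidates(toy, n=3)
best_14 = with_cdr3_length(best, 14)
```

Ranking before filtering mirrors the order in the text, where the top 10 sequences were chosen first and their CDRH3 lengths were examined afterwards.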

Light chain recognition of Envs by VRC01 and VRC01-related
antibodies has been studied in detail using structural and 454
sequencing data (33). The VRC01 light chain commonly uses the
IGKV3-11 lineage and has a characteristic five amino acid long
CDRL3 and a distinctive two amino acid deletion in CDRL1.
Therefore, we selected the IGKV3-11 lineage sequences with five
amino acid long CDRL3s, but no sequences were found with a
two amino acid deletion in CDRL1 (Figure 2B). All of them had
either framework or CDR mutations or both. Four of them had a
point mutation in CDRL1 and seven of them had a point mutation
in CDRL3.

The structural basis for germline gene usage of VRC01-related
antibodies targeting the CD4bs has been previously described (42),
which revealed a set of signature features for these antibodies that
were verified by mutagenesis. These signature features explained
the origin of the IGHV1-2 gene and antibody resistance for some
Env sequences. We found that characteristic residues including
the Trp100B of heavy chains were conserved while light chains did
not have any of the characteristic residues reported previously (42).
However, other pre-existing amino acid mutations in light chains
could have implications for the VRC01-related intermediates with
a characteristic CDRL3 of five amino acid length.

DISTRIBUTIONS OF CDR LENGTHS AND AMINO ACIDS IN THE 
VRC01-RELATED GERMLINE GENES

We analyzed the amino acid length distributions of CDRH3 and
CDRL3 sequences that were of VRC01 origins, namely, IGHV1-2
and IGKV3-11 for heavy and light chains, respectively, as derived
from the human cord blood IgM library (Figure 3). The CDRH3
lengths ranged from 4 to 27 amino acids, indicating high CDR3
length diversity (Figure 3A). VRC01 has a CDRH3 length of 14
amino acids, which is shorter than those of most other anti-HIV-1
antibodies (25). The CDRL3 lengths ranged from 4 to 14 amino
acids (Figure 3B). The CDRL3 of VRC01 has a characteristic length
of five amino acids with a mature genetic signature (33). Analysis
of the human cord blood IgM repertoire showed only a fraction
of such light chains with a shorter length of five amino acids
(Figure 3B). Frequency distributions of amino acid compositions
of CDRH1 and CDRH2 in IGHV1-2 sequences as observed in
the human cord blood IgM repertoire are plotted in Figures 3C,D,
respectively. These plots show that there are position-specific variations
in the CDRH1 and CDRH2 regions of IGHV1-2 genes. These
could indicate possible IGHV1-2-specific pre-existing amino acid
mutations in CDRH1 and CDRH2, as observed in several naive
antibody heavy chain sequences, which could inform the design of
germline precursors and intermediates of VRC01-like antibodies.

FIGURE 3 | Distributions of amino acid length in (A) CDRH3, (B) CDRL3 and amino acid
compositions in (C) CDRH1 and (D) CDRH2 as derived using the gene families of IGHV1-2
and IGKV3-11 in the human cord blood IgM library, showing the CDR3 length diversity and
pre-existing mutations in CDRH1 and CDRH2.
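
The two kinds of distribution discussed in this section, CDR3 length and position-specific amino acid composition, can be tabulated with a few lines of code. The sketch below is illustrative only: the function names are hypothetical, the toy CDR strings are made up, and the positional tabulation assumes that all CDR sequences passed in have the same length.

```python
from collections import Counter


def length_distribution(cdr3_sequences):
    """Map each CDR3 length to the number of sequences with that length."""
    return Counter(len(seq) for seq in cdr3_sequences)


def positional_frequencies(cdr_sequences):
    """Per-position amino acid frequencies for equal-length CDR sequences."""
    if len({len(seq) for seq in cdr_sequences}) > 1:
        raise ValueError("all sequences must have the same length")
    n = len(cdr_sequences)
    return [
        {aa: count / n for aa, count in Counter(column).items()}
        for column in zip(*cdr_sequences)
    ]


# Toy inputs, not taken from the cord blood library.
toy_lengths = length_distribution(["ARD", "ARDY", "ARG"])
toy_freqs = positional_frequencies(["GY", "GF", "GY", "AY"])
```

A position whose dominant residue has a frequency well below 1.0 would correspond to the kind of position-specific variation described for CDRH1 and CDRH2.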

V-D-J RECOMBINATION DIVERSITY AND IGHD READING FRAME 
USAGES IN THE VRC01-RELATED GERMLINE GENES

We previously observed that the V-D-J rearrangement patterns 
occurred at different frequencies with 1,430 V-D-J combinations 
in a human cord blood IgM repertoire (19). Figure 4A shows the 
V-D-J diversity associated with IGHV1-2 gene sequences using 
a bubble plot for comparison with different D and J genes. The 
VRC01 heavy chain uses IGHD3-16 and IGHJ2 genes to recombine
with IGHV1-2. However, other VRC01-related antibodies exhibit
a skewed usage of IGHJ genes, although at least three different IGHJ
genes (IGHJ1, IGHJ2, and IGHJ4) are involved (23). As the human
cord blood IgM library has a large functional V-D-J diversity, it
can be used to identify potential VRC01-like heavy chain germline
precursors and intermediates.
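
Counting V-D-J combinations for a single V gene, the quantity summarized by the bubble plot, can be sketched as follows. The record format (one V, D, J gene triple per sequence) and the toy records are assumptions made for illustration; they are not the IMGT/HighV-QUEST output format or data from the library.

```python
from collections import Counter


def vdj_counts(records, v_gene=None):
    """Count (V, D, J) combinations, optionally restricted to one V gene."""
    return Counter(
        (v, d, j) for v, d, j in records if v_gene is None or v == v_gene
    )


# Toy gene assignments, one (V, D, J) triple per sequence.
toy_records = [
    ("IGHV1-2", "IGHD3-16", "IGHJ2"),
    ("IGHV1-2", "IGHD3-16", "IGHJ2"),
    ("IGHV1-2", "IGHD3-10", "IGHJ4"),
    ("IGHV3-23", "IGHD3-10", "IGHJ4"),
]
ighv1_2_counts = vdj_counts(toy_records, v_gene="IGHV1-2")
```

The number of distinct keys gives the count of combinations observed, and each value gives the frequency that would set the bubble size.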

In jawed vertebrates the expressed heavy chains may use any of
the six IGHD reading frames (RFs); however, RF1 is thought to be
the preferred one as it mostly encodes tyrosine and glycine. The
remaining five RFs encode either hydrophobic or charged amino
acids, and the use of inverted RF1, RF2, and RF3 is disfavored.
Preferential usage of IGHD RFs has long been implicated
in B cell development and antigen-specific antibody production
(43-45), with RFs selected based upon their amino acid content (46).
Genetic control of IGHD RF preference and its role in regulating
repertoire development has been recognized (47). Here, we have
analyzed the productive IGHD RF usages in a human cord blood
IgM library. The frequency distribution of RFs is plotted as a pie
chart in Figure 4B. We noted that there were no
highly restricted usages of the IGHD RFs although some preferential
usages depending on the IGHD genes were found. This
suggests that IGHD RF diversity could add more diverse amino
acid content, leading to enormous CDRH3 diversity. It may also be
possible that intermediates with different RF choices play a critical
role in selecting certain maturation pathways efficiently.
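
To illustrate what the six IGHD reading frames are, the sketch below translates a D segment in three forward frames and three frames of its reverse complement (the inverted orientation). It uses the standard genetic code. The frame labels simply follow the nucleotide offset and are an assumption; they are not guaranteed to match the RF1 to RF3 naming used in the cited literature. The toy segment is made up and is not a real IGHD gene.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate complete codons only; trailing bases are ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def six_frames(d_segment: str) -> dict:
    """Translate a D segment in three forward and three inverted frames."""
    frames = {}
    strands = (("forward", d_segment), ("inverted", reverse_complement(d_segment)))
    for strand, seq in strands:
        for offset in range(3):
            frames[f"{strand}_RF{offset + 1}"] = translate(seq[offset:])
    return frames


# Toy nine-nucleotide segment (hypothetical, for illustration only).
toy_frames = six_frames("GGTTATGGT")
```

In this toy case the first forward frame yields glycine and tyrosine, while the other frames yield different residues, which is the sense in which RF choice changes the amino acid content contributed to CDRH3.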

CONCLUSION 

The antibody germline/maturation hypothesis led to a paradigm
shift in the design of immunogens for bnAb elicitation, as well as
to the realization of the importance of the complexity of the bnAb
maturation pathways and to the exploration of human antibodyomes
(14). In fact, human antibodyome exploration is also promising
for other fields of science and medicine (14, 48). This antibodyome
approach is now a major direction of research in the HIV-1
vaccine field (16, 49). An important goal is to precisely identify
naturally occurring germline-like precursors and intermediates of
bnAbs that could help in designing novel immunogens, which could
activate the corresponding BCRs and drive the immune system to
produce bnAbs within a short period of time. We presented an
approach using a human cord blood IgM library to identify putative
germline precursors and intermediates of VRC01-like heavy
and light chains, which could be useful in reconstructing the B
cell clonal lineages in the maturation pathways of VRC01-related
bnAbs. This method has the potential to help in the identification
of naturally occurring germline-like precursors and intermediates
of any known bnAb and in the development of immunogens based
on HIV-1 Envs (50) and peptides (51), as well as non-HIV-1 molecules
(12). However, major challenges remain, and new paradigms
that better fit our increased knowledge of HIV immunopathology
could possibly be more helpful in guiding future vaccine research
than past unsuccessful approaches (17).